Dataset statistics
| Number of variables | 16 |
|---|---|
| Number of observations | 16554 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.6 MiB |
| Average record size in memory | 104.0 B |
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 7 |
Alerts
| Alert | Type |
|---|---|
| grade is highly correlated with bathrooms and 4 other fields | High correlation |
| bathrooms is highly correlated with grade and 2 other fields | High correlation |
| bedrooms is highly correlated with grade and 2 other fields | High correlation |
| sqft_living15 is highly correlated with grade and 2 other fields | High correlation |
| floors is highly correlated with antiguedad_venta | High correlation |
| sqft_lot is highly correlated with sqft_lot15 | High correlation |
| price is highly correlated with grade and 1 other field | High correlation |
| sqft_lot15 is highly correlated with zipcode and 2 other fields | High correlation |
| sqft_living is highly correlated with grade and 4 other fields | High correlation |
| antiguedad_venta is highly correlated with zipcode and 4 other fields | High correlation |
| view is highly correlated with waterfront | High correlation |
| waterfront is highly correlated with view | High correlation |
| zipcode is highly correlated with sqft_lot15 and 1 other field | High correlation |
| condition is highly correlated with antiguedad_venta | High correlation |
| df_index has unique values | Unique |
| antiguedad_venta has 320 (1.9%) zeros | Zeros |
Reproduction
| Analysis started | 2022-10-02 21:55:15.270504 |
|---|---|
| Analysis finished | 2022-10-02 21:55:32.535428 |
| Duration | 17.26 seconds |
| Software version | pandas-profiling v3.3.0 |
| Download configuration | config.json |
df_index
Numeric
| Distinct | 16554 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18811.53262 |
| Minimum | 1 |
|---|---|
| Maximum | 113866 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 129.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1130.65 |
| Q1 | 6188.5 |
| median | 14556.5 |
| Q3 | 27262.5 |
| 95-th percentile | 51274.25 |
| Maximum | 113866 |
| Range | 113865 |
| Interquartile range (IQR) | 21074 |
Descriptive statistics
| Standard deviation | 16147.10472 |
|---|---|
| Coefficient of variation (CV) | 0.8583619978 |
| Kurtosis | 1.628791243 |
| Mean | 18811.53262 |
| Median Absolute Deviation (MAD) | 9673.5 |
| Skewness | 1.265459165 |
| Sum | 311406111 |
| Variance | 260728990.9 |
| Monotonicity | Not monotonic |
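
Several rows in the quantile and descriptive tables are derived from other rows, so they can be cross-checked by hand. The sketch below re-derives range, IQR, coefficient of variation and variance for the variable with mean 18811.53262, using values copied from the tables above. The variable names are illustrative and are not part of the report.

```python
# Values copied from the tables above (the variable whose mean is 18811.53262).
minimum, maximum = 1, 113866
q1, q3 = 6188.5, 27262.5
mean, std = 18811.53262, 16147.10472

# Derived statistics, using their textbook definitions.
value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared

print(f"Range:    {value_range}")
print(f"IQR:      {iqr}")
print(f"CV:       {cv:.10f}")
print(f"Variance: {variance:.1f}")
```

The results agree with the reported rows up to the rounding of the displayed inputs.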
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 19857 | 1 | < 0.1% |
| 41861 | 1 | < 0.1% |
| 27591 | 1 | < 0.1% |
| 24402 | 1 | < 0.1% |
| 5679 | 1 | < 0.1% |
| 51523 | 1 | < 0.1% |
| 23241 | 1 | < 0.1% |
| 6184 | 1 | < 0.1% |
| 22233 | 1 | < 0.1% |
| 6765 | 1 | < 0.1% |
| Other values (16544) | 16544 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 5 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 | |
| 12 | 1 | |
| 14 | 1 | |
| 15 | 1 |
| Value | Count | Frequency (%) |
| 113866 | 1 | |
| 111906 | 1 | |
| 109571 | 1 | |
| 108311 | 1 | |
| 99297 | 1 | |
| 98195 | 1 | |
| 98015 | 1 | |
| 94768 | 1 | |
| 94083 | 1 | |
| 93739 | 1 |
zipcode
Numeric
| Distinct | 70 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 98079.01027 |
| Minimum | 98001 |
|---|---|
| Maximum | 98199 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 64.8 KiB |
Quantile statistics
| Minimum | 98001 |
|---|---|
| 5-th percentile | 98004 |
| Q1 | 98033 |
| median | 98070 |
| Q3 | 98118 |
| 95-th percentile | 98177 |
| Maximum | 98199 |
| Range | 198 |
| Interquartile range (IQR) | 85 |
Descriptive statistics
| Standard deviation | 53.58031141 |
|---|---|
| Coefficient of variation (CV) | 0.0005462974317 |
| Kurtosis | -0.88492992 |
| Mean | 98079.01027 |
| Median Absolute Deviation (MAD) | 43 |
| Skewness | 0.3725473414 |
| Sum | 1623599936 |
| Variance | 2870.849771 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 98103 | 477 | 2.9% |
| 98115 | 476 | 2.9% |
| 98052 | 467 | 2.8% |
| 98034 | 446 | 2.7% |
| 98117 | 442 | 2.7% |
| 98038 | 431 | 2.6% |
| 98042 | 431 | 2.6% |
| 98118 | 406 | 2.5% |
| 98133 | 402 | 2.4% |
| 98023 | 399 | 2.4% |
| Other values (60) | 12177 |
| Value | Count | Frequency (%) |
| 98001 | 291 | |
| 98002 | 160 | |
| 98003 | 213 | |
| 98004 | 205 | |
| 98005 | 132 | 0.8% |
| 98006 | 377 | |
| 98007 | 113 | 0.7% |
| 98008 | 226 | |
| 98010 | 69 | 0.4% |
| 98011 | 150 | 0.9% |
| Value | Count | Frequency (%) |
| 98199 | 234 | |
| 98198 | 220 | |
| 98188 | 108 | 0.7% |
| 98178 | 211 | |
| 98177 | 192 | |
| 98168 | 218 | |
| 98166 | 200 | |
| 98155 | 372 | |
| 98148 | 51 | 0.3% |
| 98146 | 228 |
grade
Numeric
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.583967621 |
| Minimum | 1 |
|---|---|
| Maximum | 13 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 64.8 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 7 |
| median | 7 |
| Q3 | 8 |
| 95-th percentile | 10 |
| Maximum | 13 |
| Range | 12 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.103965081 |
|---|---|
| Coefficient of variation (CV) | 0.1455656374 |
| Kurtosis | 1.371933156 |
| Mean | 7.583967621 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.7165229669 |
| Sum | 125545 |
| Variance | 1.2187389 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=12)
| Value | Count | Frequency (%) |
| 7 | 7145 | |
| 8 | 4770 | |
| 9 | 1881 | 11.4% |
| 6 | 1592 | 9.6% |
| 10 | 698 | 4.2% |
| 11 | 211 | 1.3% |
| 5 | 183 | 1.1% |
| 12 | 40 | 0.2% |
| 4 | 23 | 0.1% |
| 13 | 7 | < 0.1% |
| Other values (2) | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 1 | < 0.1% |
| 3 | 3 | < 0.1% |
| 4 | 23 | 0.1% |
| 5 | 183 | 1.1% |
| 6 | 1592 | 9.6% |
| 7 | 7145 | |
| 8 | 4770 | |
| 9 | 1881 | 11.4% |
| 10 | 698 | 4.2% |
| 11 | 211 | 1.3% |
| Value | Count | Frequency (%) |
| 13 | 7 | < 0.1% |
| 12 | 40 | 0.2% |
| 11 | 211 | 1.3% |
| 10 | 698 | 4.2% |
| 9 | 1881 | 11.4% |
| 8 | 4770 | |
| 7 | 7145 | |
| 6 | 1592 | 9.6% |
| 5 | 183 | 1.1% |
| 4 | 23 | 0.1% |
view
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 937.8 KiB |
| 0 | |
|---|---|
| 2 | 666 |
| 3 | 324 |
| 1 | 251 |
| 4 | 190 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 16554 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 15123 | |
| 2 | 666 | 4.0% |
| 3 | 324 | 2.0% |
| 1 | 251 | 1.5% |
| 4 | 190 | 1.1% |
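
Some Frequency (%) cells in the value tables are blank in this copy of the report, but they can be recovered from the counts and the 16554 observations. The sketch below does this for the table above. `format_frequency` is a hypothetical helper, and its "< 0.1%" cutoff is an assumption inferred from the cells that are filled in, not taken from pandas-profiling itself.

```python
def format_frequency(count, total):
    """Format count/total as a percentage string, one decimal place.

    Assumption: non-zero shares below 0.1% are shown as "< 0.1%",
    mirroring the cells that are present in the report.
    """
    pct = 100 * count / total
    if 0 < pct < 0.1:
        return "< 0.1%"
    return f"{pct:.1f}%"


total = 16554  # Number of observations, from the dataset statistics.
view_counts = {"0": 15123, "2": 666, "3": 324, "1": 251, "4": 190}

frequencies = {value: format_frequency(count, total)
               for value, count in view_counts.items()}

for value, text in frequencies.items():
    print(value, view_counts[value], text)
```

The same helper applies to any other value table in the report, since every variable has 16554 non-missing observations.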
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 15123 | |
| 2 | 666 | 4.0% |
| 3 | 324 | 2.0% |
| 1 | 251 | 1.5% |
| 4 | 190 | 1.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 15123 | |
| 2 | 666 | 4.0% |
| 3 | 324 | 2.0% |
| 1 | 251 | 1.5% |
| 4 | 190 | 1.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 16554 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 15123 | |
| 2 | 666 | 4.0% |
| 3 | 324 | 2.0% |
| 1 | 251 | 1.5% |
| 4 | 190 | 1.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 16554 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 15123 | |
| 2 | 666 | 4.0% |
| 3 | 324 | 2.0% |
| 1 | 251 | 1.5% |
| 4 | 190 | 1.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 16554 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 15123 | |
| 2 | 666 | 4.0% |
| 3 | 324 | 2.0% |
| 1 | 251 | 1.5% |
| 4 | 190 | 1.1% |
bathrooms
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 970.1 KiB |
| 2.0 | |
|---|---|
| 1.0 | |
| 3.0 | |
| 4.0 | 191 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 49662 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 2.0 |
| 4th row | 1.0 |
| 5th row | 2.0 |
Common Values
| Value | Count | Frequency (%) |
| 2.0 | 8224 | |
| 1.0 | 6635 | |
| 3.0 | 1504 | 9.1% |
| 4.0 | 191 | 1.2% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 2.0 | 8224 | |
| 1.0 | 6635 | |
| 3.0 | 1504 | 9.1% |
| 4.0 | 191 | 1.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| . | 16554 | |
| 0 | 16554 | |
| 2 | 8224 | |
| 1 | 6635 | |
| 3 | 1504 | 3.0% |
| 4 | 191 | 0.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 33108 | |
| Other Punctuation | 16554 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 16554 | |
| 2 | 8224 | |
| 1 | 6635 | |
| 3 | 1504 | 4.5% |
| 4 | 191 | 0.6% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 16554 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 49662 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| . | 16554 | |
| 0 | 16554 | |
| 2 | 8224 | |
| 1 | 6635 | |
| 3 | 1504 | 3.0% |
| 4 | 191 | 0.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 49662 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| . | 16554 | |
| 0 | 16554 | |
| 2 | 8224 | |
| 1 | 6635 | |
| 3 | 1504 | 3.0% |
| 4 | 191 | 0.4% |
bedrooms
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 970.1 KiB |
| 3.0 | |
|---|---|
| 4.0 | |
| 2.0 | |
| 5.0 | |
| 1.0 | 161 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 49662 |
|---|---|
| Distinct characters | 7 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 3.0 |
|---|---|
| 2nd row | 3.0 |
| 3rd row | 4.0 |
| 4th row | 5.0 |
| 5th row | 3.0 |
Common Values
| Value | Count | Frequency (%) |
| 3.0 | 7846 | |
| 4.0 | 5208 | |
| 2.0 | 2163 | 13.1% |
| 5.0 | 1176 | 7.1% |
| 1.0 | 161 | 1.0% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 3.0 | 7846 | |
| 4.0 | 5208 | |
| 2.0 | 2163 | 13.1% |
| 5.0 | 1176 | 7.1% |
| 1.0 | 161 | 1.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| . | 16554 | |
| 0 | 16554 | |
| 3 | 7846 | |
| 4 | 5208 | 10.5% |
| 2 | 2163 | 4.4% |
| 5 | 1176 | 2.4% |
| 1 | 161 | 0.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 33108 | |
| Other Punctuation | 16554 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 16554 | |
| 3 | 7846 | |
| 4 | 5208 | 15.7% |
| 2 | 2163 | 6.5% |
| 5 | 1176 | 3.6% |
| 1 | 161 | 0.5% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 16554 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 49662 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| . | 16554 | |
| 0 | 16554 | |
| 3 | 7846 | |
| 4 | 5208 | 10.5% |
| 2 | 2163 | 4.4% |
| 5 | 1176 | 2.4% |
| 1 | 161 | 0.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 49662 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| . | 16554 | |
| 0 | 16554 | |
| 3 | 7846 | |
| 4 | 5208 | 10.5% |
| 2 | 2163 | 4.4% |
| 5 | 1176 | 2.4% |
| 1 | 161 | 0.3% |
sqft_living15
Numeric
| Distinct | 679 |
|---|---|
| Distinct (%) | 4.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1944.15084 |
| Minimum | 460 |
|---|---|
| Maximum | 5790 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 129.5 KiB |
Quantile statistics
| Minimum | 460 |
|---|---|
| 5-th percentile | 1130 |
| Q1 | 1470 |
| median | 1810 |
| Q3 | 2300 |
| 95-th percentile | 3190 |
| Maximum | 5790 |
| Range | 5330 |
| Interquartile range (IQR) | 830 |
Descriptive statistics
| Standard deviation | 649.2374266 |
|---|---|
| Coefficient of variation (CV) | 0.3339439581 |
| Kurtosis | 1.455088632 |
| Mean | 1944.15084 |
| Median Absolute Deviation (MAD) | 390 |
| Skewness | 1.059620425 |
| Sum | 32183473 |
| Variance | 421509.2361 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1440 | 160 | 1.0% |
| 1560 | 159 | 1.0% |
| 1540 | 155 | 0.9% |
| 1500 | 150 | 0.9% |
| 1460 | 144 | 0.9% |
| 1720 | 138 | 0.8% |
| 1580 | 137 | 0.8% |
| 1480 | 134 | 0.8% |
| 1610 | 134 | 0.8% |
| 1520 | 134 | 0.8% |
| Other values (669) | 15109 |
| Value | Count | Frequency (%) |
| 460 | 1 | < 0.1% |
| 620 | 2 | < 0.1% |
| 670 | 1 | < 0.1% |
| 690 | 2 | < 0.1% |
| 700 | 2 | < 0.1% |
| 710 | 1 | < 0.1% |
| 720 | 2 | < 0.1% |
| 740 | 5 | |
| 750 | 1 | < 0.1% |
| 760 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 5790 | 5 | |
| 5600 | 1 | < 0.1% |
| 5380 | 1 | < 0.1% |
| 5330 | 1 | < 0.1% |
| 5220 | 1 | < 0.1% |
| 5080 | 1 | < 0.1% |
| 5070 | 1 | < 0.1% |
| 4950 | 1 | < 0.1% |
| 4930 | 1 | < 0.1% |
| 4913 | 1 | < 0.1% |
waterfront
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 937.8 KiB |
| 0 | |
|---|---|
| 1 | 95 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 16554 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 16459 | |
| 1 | 95 | 0.6% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 16459 | |
| 1 | 95 | 0.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 16459 | |
| 1 | 95 | 0.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 16554 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 16459 | |
| 1 | 95 | 0.6% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 16554 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 16459 | |
| 1 | 95 | 0.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 16554 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 16459 | |
| 1 | 95 | 0.6% |
floors
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 970.1 KiB |
| 1.0 | |
|---|---|
| 2.0 | |
| 3.0 | 489 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 49662 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 2.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
| 1.0 | 9811 | |
| 2.0 | 6254 | |
| 3.0 | 489 | 3.0% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 1.0 | 9811 | |
| 2.0 | 6254 | |
| 3.0 | 489 | 3.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| . | 16554 | |
| 0 | 16554 | |
| 1 | 9811 | |
| 2 | 6254 | 12.6% |
| 3 | 489 | 1.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 33108 | |
| Other Punctuation | 16554 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 16554 | |
| 1 | 9811 | |
| 2 | 6254 | 18.9% |
| 3 | 489 | 1.5% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 16554 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 49662 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| . | 16554 | |
| 0 | 16554 | |
| 1 | 9811 | |
| 2 | 6254 | 12.6% |
| 3 | 489 | 1.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 49662 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| . | 16554 | |
| 0 | 16554 | |
| 1 | 9811 | |
| 2 | 6254 | 12.6% |
| 3 | 489 | 1.0% |
sqft_lot
Numeric
| Distinct | 7871 |
|---|---|
| Distinct (%) | 47.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9934.879667 |
| Minimum | 520 |
|---|---|
| Maximum | 137214 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 129.5 KiB |
Quantile statistics
| Minimum | 520 |
|---|---|
| 5-th percentile | 1715.65 |
| Q1 | 5000 |
| median | 7480 |
| Q3 | 10140 |
| 95-th percentile | 29944 |
| Maximum | 137214 |
| Range | 136694 |
| Interquartile range (IQR) | 5140 |
Descriptive statistics
| Standard deviation | 10957.36316 |
|---|---|
| Coefficient of variation (CV) | 1.102918559 |
| Kurtosis | 31.24093454 |
| Mean | 9934.879667 |
| Median Absolute Deviation (MAD) | 2507.5 |
| Skewness | 4.698176897 |
| Sum | 164461998 |
| Variance | 120063807.5 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 5000 | 289 | 1.7% |
| 6000 | 221 | 1.3% |
| 4000 | 197 | 1.2% |
| 7200 | 174 | 1.1% |
| 4800 | 97 | 0.6% |
| 7500 | 95 | 0.6% |
| 4500 | 93 | 0.6% |
| 9600 | 91 | 0.5% |
| 8400 | 88 | 0.5% |
| 3600 | 79 | 0.5% |
| Other values (7861) | 15130 |
| Value | Count | Frequency (%) |
| 520 | 1 | |
| 600 | 1 | |
| 609 | 1 | |
| 635 | 1 | |
| 638 | 1 | |
| 649 | 2 | |
| 651 | 1 | |
| 676 | 1 | |
| 681 | 1 | |
| 683 | 1 |
| Value | Count | Frequency (%) |
| 137214 | 1 | |
| 136915 | 1 | |
| 136778 | 1 | |
| 136290 | 1 | |
| 130680 | 1 | |
| 130017 | 1 | |
| 127631 | 1 | |
| 125452 | 1 | |
| 122038 | 1 | |
| 120661 | 1 |
price
Numeric
| Distinct | 3237 |
|---|---|
| Distinct (%) | 19.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 510603.4272 |
| Minimum | 75000 |
|---|---|
| Maximum | 7700000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 129.5 KiB |
Quantile statistics
| Minimum | 75000 |
|---|---|
| 5-th percentile | 210000 |
| Q1 | 316000 |
| median | 440000 |
| Q3 | 620000 |
| 95-th percentile | 963000 |
| Maximum | 7700000 |
| Range | 7625000 |
| Interquartile range (IQR) | 304000 |
Descriptive statistics
| Standard deviation | 323805.8088 |
|---|---|
| Coefficient of variation (CV) | 0.634163015 |
| Kurtosis | 50.59509917 |
| Mean | 510603.4272 |
| Median Absolute Deviation (MAD) | 141000 |
| Skewness | 4.658301218 |
| Sum | 8452529134 |
| Variance | 1.048502018 × 10^11 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 350000 | 137 | 0.8% |
| 450000 | 133 | 0.8% |
| 425000 | 128 | 0.8% |
| 550000 | 126 | 0.8% |
| 500000 | 123 | 0.7% |
| 375000 | 116 | 0.7% |
| 325000 | 114 | 0.7% |
| 300000 | 109 | 0.7% |
| 400000 | 108 | 0.7% |
| 250000 | 105 | 0.6% |
| Other values (3227) | 15355 |
| Value | Count | Frequency (%) |
| 75000 | 1 | |
| 78000 | 1 | |
| 80000 | 1 | |
| 81000 | 1 | |
| 82000 | 1 | |
| 82500 | 1 | |
| 83000 | 1 | |
| 84000 | 1 | |
| 85000 | 2 | |
| 89000 | 1 |
| Value | Count | Frequency (%) |
| 7700000 | 1 | |
| 7062500 | 1 | |
| 5570000 | 1 | |
| 5300000 | 1 | |
| 5110800 | 1 | |
| 4500000 | 1 | |
| 4000000 | 1 | |
| 3850000 | 1 | |
| 3800000 | 2 | |
| 3710000 | 1 |
condition
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 937.8 KiB |
| 3 | |
|---|---|
| 4 | |
| 5 | |
| 2 | 126 |
| 1 | 23 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 16554 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 3 |
|---|---|
| 2nd row | 3 |
| 3rd row | 3 |
| 4th row | 3 |
| 5th row | 3 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 10763 | |
| 4 | 4357 | |
| 5 | 1285 | 7.8% |
| 2 | 126 | 0.8% |
| 1 | 23 | 0.1% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 3 | 10763 | |
| 4 | 4357 | |
| 5 | 1285 | 7.8% |
| 2 | 126 | 0.8% |
| 1 | 23 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 10763 | |
| 4 | 4357 | |
| 5 | 1285 | 7.8% |
| 2 | 126 | 0.8% |
| 1 | 23 | 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 16554 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 10763 | |
| 4 | 4357 | |
| 5 | 1285 | 7.8% |
| 2 | 126 | 0.8% |
| 1 | 23 | 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 16554 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 10763 | |
| 4 | 4357 | |
| 5 | 1285 | 7.8% |
| 2 | 126 | 0.8% |
| 1 | 23 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 16554 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 10763 | |
| 4 | 4357 | |
| 5 | 1285 | 7.8% |
| 2 | 126 | 0.8% |
| 1 | 23 | 0.1% |
sqft_lot15
Numeric
| Distinct | 7013 |
|---|---|
| Distinct (%) | 42.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8996.247856 |
| Minimum | 659 |
|---|---|
| Maximum | 57140 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 129.5 KiB |
Quantile statistics
| Minimum | 659 |
|---|---|
| 5-th percentile | 1921.95 |
| Q1 | 5011.25 |
| median | 7500 |
| Q3 | 9750 |
| 95-th percentile | 23045.85 |
| Maximum | 57140 |
| Range | 56481 |
| Interquartile range (IQR) | 4738.75 |
Descriptive statistics
| Standard deviation | 7636.814694 |
|---|---|
| Coefficient of variation (CV) | 0.848888872 |
| Kurtosis | 11.84745816 |
| Mean | 8996.247856 |
| Median Absolute Deviation (MAD) | 2400 |
| Skewness | 3.168057171 |
| Sum | 148923887 |
| Variance | 58320938.67 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 5000 | 339 | 2.0% |
| 4000 | 285 | 1.7% |
| 6000 | 230 | 1.4% |
| 7200 | 171 | 1.0% |
| 7500 | 111 | 0.7% |
| 4800 | 108 | 0.7% |
| 4500 | 94 | 0.6% |
| 8400 | 90 | 0.5% |
| 3600 | 89 | 0.5% |
| 4080 | 82 | 0.5% |
| Other values (7003) | 14955 |
| Value | Count | Frequency (%) |
| 659 | 1 | < 0.1% |
| 660 | 1 | < 0.1% |
| 748 | 1 | < 0.1% |
| 750 | 3 | |
| 755 | 1 | < 0.1% |
| 758 | 1 | < 0.1% |
| 794 | 1 | < 0.1% |
| 810 | 2 | |
| 886 | 3 | |
| 887 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 57140 | 1 | < 0.1% |
| 57063 | 2 | < 0.1% |
| 57000 | 1 | < 0.1% |
| 56827 | 1 | < 0.1% |
| 56628 | 6 | |
| 56568 | 1 | < 0.1% |
| 56192 | 2 | < 0.1% |
| 55657 | 1 | < 0.1% |
| 55322 | 1 | < 0.1% |
| 55023 | 1 | < 0.1% |
sqft_living
Numeric
| Distinct | 871 |
|---|---|
| Distinct (%) | 5.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2016.595143 |
| Minimum | 290 |
|---|---|
| Maximum | 12050 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 129.5 KiB |
Quantile statistics
| Minimum | 290 |
|---|---|
| 5-th percentile | 920 |
| Q1 | 1400 |
| median | 1880 |
| Q3 | 2478.75 |
| 95-th percentile | 3560 |
| Maximum | 12050 |
| Range | 11760 |
| Interquartile range (IQR) | 1078.75 |
Descriptive statistics
| Standard deviation | 848.7053985 |
|---|---|
| Coefficient of variation (CV) | 0.4208605785 |
| Kurtosis | 4.451354919 |
| Mean | 2016.595143 |
| Median Absolute Deviation (MAD) | 520 |
| Skewness | 1.323768322 |
| Sum | 33382716 |
| Variance | 720300.8534 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1440 | 111 | 0.7% |
| 1400 | 110 | 0.7% |
| 1300 | 106 | 0.6% |
| 1540 | 102 | 0.6% |
| 1480 | 102 | 0.6% |
| 1010 | 99 | 0.6% |
| 1660 | 99 | 0.6% |
| 1200 | 99 | 0.6% |
| 1900 | 98 | 0.6% |
| 1560 | 98 | 0.6% |
| Other values (861) | 15530 |
| Value | Count | Frequency (%) |
| 290 | 1 | |
| 380 | 1 | |
| 390 | 1 | |
| 420 | 2 | |
| 430 | 1 | |
| 440 | 1 | |
| 470 | 2 | |
| 480 | 2 | |
| 490 | 1 | |
| 500 | 1 |
| Value | Count | Frequency (%) |
| 12050 | 1 | |
| 10040 | 1 | |
| 9200 | 1 | |
| 8020 | 1 | |
| 8010 | 1 | |
| 7710 | 1 | |
| 7620 | 1 | |
| 7480 | 1 | |
| 7390 | 1 | |
| 7350 | 1 |
fue_renovada
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 937.8 KiB |
| 0 | |
|---|---|
| 1 | 649 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 16554 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 15905 | |
| 1 | 649 | 3.9% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 15905 | |
| 1 | 649 | 3.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 15905 | |
| 1 | 649 | 3.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 16554 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 15905 | |
| 1 | 649 | 3.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 16554 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 15905 | |
| 1 | 649 | 3.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 16554 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 15905 | |
| 1 | 649 | 3.9% |
antiguedad_venta
Numeric
| Distinct | 117 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 43.64552374 |
| Minimum | -1 |
|---|---|
| Maximum | 115 |
| Zeros | 320 |
| Zeros (%) | 1.9% |
| Negative | 12 |
| Negative (%) | 0.1% |
| Memory size | 129.5 KiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 18 |
| median | 40 |
| Q3 | 63 |
| 95-th percentile | 99 |
| Maximum | 115 |
| Range | 116 |
| Interquartile range (IQR) | 45 |
Descriptive statistics
| Standard deviation | 29.34580409 |
|---|---|
| Coefficient of variation (CV) | 0.6723668678 |
| Kurtosis | -0.6696125664 |
| Mean | 43.64552374 |
| Median Absolute Deviation (MAD) | 23 |
| Skewness | 0.4480073194 |
| Sum | 722508 |
| Variance | 861.1762178 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 9 | 355 | 2.1% |
| 11 | 344 | 2.1% |
| 8 | 338 | 2.0% |
| 10 | 330 | 2.0% |
| 0 | 320 | 1.9% |
| 37 | 314 | 1.9% |
| 7 | 300 | 1.8% |
| 36 | 293 | 1.8% |
| 46 | 277 | 1.7% |
| 47 | 274 | 1.7% |
| Other values (107) | 13409 |
| Value | Count | Frequency (%) |
| -1 | 12 | 0.1% |
| 0 | 320 | |
| 1 | 220 | |
| 2 | 133 | 0.8% |
| 3 | 120 | 0.7% |
| 4 | 102 | 0.6% |
| 5 | 150 | |
| 6 | 250 | |
| 7 | 300 | |
| 8 | 338 |
| Value | Count | Frequency (%) |
| 115 | 21 | 0.1% |
| 114 | 47 | |
| 113 | 23 | 0.1% |
| 112 | 27 | 0.2% |
| 111 | 39 | |
| 110 | 42 | |
| 109 | 51 | |
| 108 | 64 | |
| 107 | 71 | |
| 106 | 52 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
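
The rank-based definition above can be sketched in plain Python: rank both variables (giving ties their average rank), then apply the Pearson formula to the ranks. This is a minimal illustration and not the routine the report itself uses. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print("Spearman:", spearman_rho(x, y))
print("Pearson: ", pearson_r(x, y))
```

Because y increases whenever x does, the ranks coincide and ρ reaches its maximum, while Pearson's r on the raw values stays below 1.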
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
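
The invariance under location and scale changes can be checked numerically. The sketch below uses the sqft_living and price values from the first five sample rows of this report, then converts the area to square metres and re-expresses the price in shifted thousands. The function name `pearson_r` and the particular rescalings are illustrative choices, not part of the report.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# First five sample rows of the report (sqft_living and price columns).
sqft_living = [2610.0, 2210.0, 2650.0, 1950.0, 1630.0]
price = [810000.0, 685000.0, 725000.0, 274000.0, 445000.0]

r_original = pearson_r(sqft_living, price)

# Separate positive rescalings and shifts of each variable.
area_m2 = [0.092903 * v for v in sqft_living]
price_shifted = [(v - 100000.0) / 1000.0 for v in price]

r_rescaled = pearson_r(area_m2, price_shifted)

print(r_original, r_rescaled)
```

The two printed values agree up to floating-point error. A negative scale factor on one variable would flip the sign of r without changing its magnitude.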
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
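
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that description. Tied pairs count as neither concordant nor discordant, and libraries often default to tie-adjusted variants such as tau-b, so values may differ when ties are present. The function name `kendall_tau_a` is hypothetical, and the data are the grade and price values from the first five sample rows of this report.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, as described in the text."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # the variables move in opposite directions
        # product == 0 means a tie in x or y: counted in neither group
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# First five sample rows of the report (grade and price columns).
grade = [10, 8, 8, 7, 7]
price = [810000.0, 685000.0, 725000.0, 274000.0, 445000.0]

tau = kendall_tau_a(grade, price)
print(tau)
```

Of the 10 pairs here, 8 are concordant and 2 are tied on grade, so this quadratic-time loop is only practical for small samples.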
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
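
Returning to the bias-corrected Cramér's V described above, the following is a minimal sketch of Bergsma's correction in plain Python. It is an illustration under stated assumptions and not the code the report runs. The function name `cramers_v_corrected` is hypothetical, and returning 0.0 for degenerate tables is a choice made here.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V (Bergsma, 2013) for two nominal sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi squared and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table (a constant variable); choice made here
    return sqrt(phi2_corr / denom)


# Perfect association: y repeats x.
same = ["a", "b"] * 50
print(cramers_v_corrected(same, same))

# Independence: every combination occurs equally often.
left = ["a", "a", "b", "b"] * 25
right = ["c", "d", "c", "d"] * 25
print(cramers_v_corrected(left, right))
```

The two toy inputs exercise the endpoints of the range, giving perfect association and independence respectively.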
First rows
| | df_index | zipcode | grade | view | bathrooms | bedrooms | sqft_living15 | waterfront | floors | sqft_lot | price | condition | sqft_lot15 | sqft_living | fue_renovada | antiguedad_venta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19857 | 98006 | 10 | 0 | 2.0 | 3.0 | 3140.0 | 0 | 2.0 | 8481.0 | 810000.0 | 3 | 10008.0 | 2610.0 | 0 | 22.0 |
| 1 | 14014 | 98033 | 8 | 1 | 1.0 | 3.0 | 2210.0 | 0 | 1.0 | 8955.0 | 685000.0 | 3 | 8976.0 | 2210.0 | 0 | 41.0 |
| 2 | 32909 | 98005 | 8 | 0 | 2.0 | 4.0 | 2230.0 | 0 | 2.0 | 18295.0 | 725000.0 | 3 | 19856.0 | 2650.0 | 0 | 28.0 |
| 3 | 16305 | 98001 | 7 | 0 | 1.0 | 5.0 | 1660.0 | 0 | 1.0 | 8720.0 | 274000.0 | 3 | 8030.0 | 1950.0 | 0 | 53.0 |
| 4 | 6647 | 98011 | 7 | 0 | 2.0 | 3.0 | 1620.0 | 0 | 1.0 | 6449.0 | 445000.0 | 3 | 7429.0 | 1630.0 | 0 | 29.0 |
| 5 | 5865 | 98040 | 8 | 0 | 2.0 | 4.0 | 2550.0 | 0 | 1.0 | 8760.0 | 762500.0 | 4 | 10376.0 | 2610.0 | 0 | 36.0 |
| 6 | 8009 | 98004 | 8 | 1 | 1.0 | 3.0 | 2630.0 | 0 | 1.0 | 14133.0 | 979000.0 | 4 | 17376.0 | 1700.0 | 0 | 60.0 |
| 7 | 4731 | 98011 | 8 | 0 | 3.0 | 5.0 | 2640.0 | 0 | 2.0 | 4369.0 | 540000.0 | 3 | 4610.0 | 2870.0 | 0 | 7.0 |
| 8 | 38480 | 98052 | 9 | 0 | 2.0 | 4.0 | 2730.0 | 0 | 2.0 | 8810.0 | 690000.0 | 3 | 5100.0 | 2700.0 | 0 | 10.0 |
| 9 | 13246 | 98072 | 7 | 0 | 1.0 | 3.0 | 1260.0 | 0 | 1.0 | 9673.0 | 375000.0 | 3 | 9681.0 | 1660.0 | 0 | 38.0 |
Last rows
| | df_index | zipcode | grade | view | bathrooms | bedrooms | sqft_living15 | waterfront | floors | sqft_lot | price | condition | sqft_lot15 | sqft_living | fue_renovada | antiguedad_venta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16544 | 9302 | 98148 | 7 | 0 | 1.0 | 2.0 | 1890.0 | 0 | 1.0 | 6000.0 | 246500.0 | 2 | 8547.0 | 940.0 | 0 | 61.0 |
| 16545 | 26872 | 98008 | 6 | 0 | 1.0 | 3.0 | 1210.0 | 0 | 1.0 | 8000.0 | 475000.0 | 4 | 7875.0 | 1270.0 | 0 | 55.0 |
| 16546 | 49558 | 98198 | 6 | 2 | 1.0 | 2.0 | 1380.0 | 0 | 1.0 | 8925.0 | 175000.0 | 3 | 7440.0 | 1170.0 | 0 | 103.0 |
| 16547 | 146 | 98117 | 6 | 0 | 1.0 | 2.0 | 980.0 | 0 | 1.0 | 2130.0 | 400000.0 | 4 | 2800.0 | 980.0 | 0 | 96.0 |
| 16548 | 9396 | 98065 | 7 | 0 | 2.0 | 3.0 | 2190.0 | 0 | 2.0 | 7263.0 | 409000.0 | 3 | 5900.0 | 1950.0 | 0 | 7.0 |
| 16549 | 14466 | 98198 | 7 | 0 | 2.0 | 4.0 | 1630.0 | 0 | 2.0 | 6000.0 | 175000.0 | 3 | 6000.0 | 1780.0 | 0 | 23.0 |
| 16550 | 30056 | 98042 | 6 | 0 | 1.0 | 3.0 | 920.0 | 0 | 1.0 | 5525.0 | 191000.0 | 5 | 5330.0 | 840.0 | 0 | 46.0 |
| 16551 | 5824 | 98106 | 7 | 0 | 2.0 | 3.0 | 1780.0 | 0 | 1.0 | 6771.0 | 310000.0 | 3 | 6771.0 | 1780.0 | 0 | 24.0 |
| 16552 | 16712 | 98038 | 7 | 0 | 2.0 | 3.0 | 1060.0 | 0 | 2.0 | 3011.0 | 230000.0 | 3 | 3232.0 | 1340.0 | 0 | 19.0 |
| 16553 | 237 | 98075 | 10 | 0 | 2.0 | 3.0 | 2970.0 | 0 | 2.0 | 7857.0 | 800000.0 | 3 | 7857.0 | 3240.0 | 0 | 20.0 |